R, Programming assignment

Simulating pathways, mutual exclusivity, et al.

Authors:

Raquel Blanco Martínez-Illescas, Daniel Peñas Utrilla, Henry Secaira Morocho

Bioinformatics and Computational Biology, UAM

27/01/2021

Introduction

Cancer Progression Models

Cancer is an heterogeneous disease caused by the continuous accumulation of somatic mutations during lifetime of an individual. Somatic mutations that affect the cells are divided in:

  • Passenger mutations --> Silent mutations
  • Driver mutations --> Lead to cancer progression

Cancer Progression Models (cont.)

  • Tumor samples are obtained as cross-sectional samples --> single-time snapshots from tumors
  • Not all possible order in mutations are equally responsible for cancer progression --> infer order of accumulation of mutations leading to cancer progression. Order in accumulation of mutations is covered by Cancer Progression Models (CPM). They are depicted as Directed Acyclic Graphs (DAG), where nodes represent genes and arrows dependencies between them

Cristea et al., 2017

Evolutionary Models

  • Order of restrictions are encoded in DAGs

  • Mutations occur only if the preceding parent mutations have occurred (monotonicity)

  • Genotypes can follow different paths during disease progression

  • Evolutionary tumor progression models allows deviations from monotonicity

Diaz-Uriarte, R. (2018)

Evolutionary Models (cont.)

  • Fitness landscapes can be used to:
    • Visualize the possible paths of tumor progression
    • Identify genes that block those paths



Diaz-Uriarte, R. (2018)

Order of effects

  • The order in which somatic mutations are acquired influence clonal evolution

    • Mutations can behave as driver or passenger depending on the genetic context
  • Therefore, the fitness of a mutant depends in which mutations were acquired previously



Epistatic interactions

  • Cancer progression is driven by the accumulation of somatic mutations that interact epistatically

  • This is, their effect is non-additive to the tumor fitness as a phenotype

Synthetic Viability

  • The combination of two mutations that rescue the lethal effects of a each single mutation

Mutual Exclusivity

  • Common in cancer progresion: pathways
  • Null effect: a mutation that occurs first produces the most selective advantage

Mutual Exclusivity (cont.)

  • Synthetic lethality: combination of two mutations is detrimental for the viability of the cell, whereas each mutation is not

Frequency-dependent fitness

  • Coexistance of multiple clones in time

  • Fitness is actually a function of the relative frequency of other clones

PathTiMEx, a generative probabilistic graphical model of cancer progression.

PathTiMex is a probabilistic model of tumor progression among mutually exclusive driver pathways. This model was used to infer cancer progression in colorectal cancer.

Cristea et al., 2017

This generative model is mapped into an evolutionary model, where deviations from monotonicity are allowed.

Model

In [4]:
## Restriction table (extended version of the poset)
colcancer <- data.frame(
                 parent = c(rep("Root",3), "A", "B", "C"), ## Parent nodes
                 child = c("A", "B", "D", "C", "E", "E"), ## Child nodes
                 s = c(0.5, 0.2, 0.05, 0.1, rep(0.05, 2)), ## Restrictions are satisfied
                 sh = -0.3, ## Deviations from monotonicity (penalization)
                 typeDep = "MN" ## Type of dependency 
                  )
## Fitness specification of the poset
colcancer_efec <- allFitnessEffects(
                  colcancer, # Poset
                  geneToModule = c( ## Specification of the modules
                               "Root" = "Root", 
                               "A" = "APC",
                               "B" = "TP53, EVC2",
                               "C" = "KRAS",
                               "D" = "PI3KCA, EPHA",
                               "E" = "FBXW7, TCF7L2"),
                  drvNames = c( ## Specification of drivers
                               "APC", "TP53", "EVC2", "KRAS",
                               "PI3KCA", "EPHA", "FBXW7", "TCF7L2"))
In [3]:
## DAG representation
plot(colcancer_efec, expandModules = TRUE, autofit = TRUE, lwdf = 2)

Map Genotypes to Fitness

In [5]:
# Evaluation of all genotypes and its fitness
colcancer_efec_FL <- evalAllGenotypes(colcancer_efec, max = 110000)
## Plot of fitness landscape
plotFitnessLandscape(colcancer_efec_FL)

Order Effects in cancer progression

Simplified model with 4 genes: APC, TP53, FBXW7 and KRAS.

In [4]:
# DAG from the simplified model
plot(cc_visuali, expandModules = TRUE, autofit = TRUE, lwdf = 2)

Order Effects in cancer progression (cont.)

Mutation's order is inferred with pathTiMEx through the waiting time rate parameter. It informs about the mutation's arise. Fitness value associated to each order is derived from this parameter.

Gene/module Waiting time rate parameter
APC 9.5
KRAS 2.89
TP53, EVC2 1.92
PIK3CA, EPHA3 0.17
FBXW7, TCF7L2 0.08

The order in mutations: APC > TP53 > KRAS > FBXW7 is consistent with restrictions and is associated to the highest fitness value. Different order deviates from monotoniciy and lower fitness is associated.

Order Effects in cancer progression (cont.)

In [5]:
## Order effects
cc_order <- allFitnessEffects(
  orderEffects = c("A > B > C > D" = 0.5, 
                   "B > A > C > D" = 0.2,
                   "B > C > A > D" = 0.1,
                   "B > C > D > A" = 0.05,
                   "A > C" = 0.2,
                   "C > A" = 0.05,
                   "D > A" = 0.05,
                   "A > D" = 0.2,
                   "B > D" = 0.2,
                   "C > D" = 0.2,
                   "B > C" = 0.2,
                   "C > B" = 0.1,
                   "B > A" = 0.1,
                   "A > B" = 0.3),

  geneToModule =
    c("A" = "APC",
      "B" = "KRAS",
      "C" = "TP53",
      "D" = "FBXW7") )

Map Genotypes to Fitness

In [6]:
## Fitness associated to each genotype
(cc_order_geno <- evalAllGenotypes(cc_order, order = TRUE))
GenotypeFitness
APC 1,0000
FBXW7 1,0000
KRAS 1,0000
TP53 1,0000
APC > FBXW7 1,2000
APC > KRAS 1,3000
APC > TP53 1,2000
FBXW7 > APC 1,0500
FBXW7 > KRAS 1,0000
FBXW7 > TP53 1,0000
KRAS > APC 1,1000
KRAS > FBXW7 1,2000
KRAS > TP53 1,2000
TP53 > APC 1,0500
TP53 > FBXW7 1,2000
TP53 > KRAS 1,1000
APC > FBXW7 > KRAS 1,5600
APC > FBXW7 > TP53 1,4400
APC > KRAS > FBXW7 1,8720
APC > KRAS > TP53 1,8720
APC > TP53 > FBXW7 1,7280
APC > TP53 > KRAS 1,7160
FBXW7 > APC > KRAS 1,3650
FBXW7 > APC > TP53 1,2600
FBXW7 > KRAS > APC 1,1550
FBXW7 > KRAS > TP531,2000
FBXW7 > TP53 > APC 1,1025
FBXW7 > TP53 > KRAS1,1000
KRAS > APC > FBXW7 1,5840
KRAS > APC > TP53 1,5840
......
TP53 > APC > FBXW7 1,512000
TP53 > APC > KRAS 1,501500
TP53 > FBXW7 > APC 1,323000
TP53 > FBXW7 > KRAS 1,320000
TP53 > KRAS > APC 1,270500
TP53 > KRAS > FBXW7 1,584000
APC > FBXW7 > KRAS > TP532,246400
APC > FBXW7 > TP53 > KRAS2,059200
APC > KRAS > FBXW7 > TP532,695680
APC > KRAS > TP53 > FBXW74,852224
APC > TP53 > FBXW7 > KRAS2,471040
APC > TP53 > KRAS > FBXW72,965248
FBXW7 > APC > KRAS > TP531,965600
FBXW7 > APC > TP53 > KRAS1,801800
FBXW7 > KRAS > APC > TP531,663200
FBXW7 > KRAS > TP53 > APC1,455300
FBXW7 > TP53 > APC > KRAS1,576575
FBXW7 > TP53 > KRAS > APC1,334025
KRAS > APC > FBXW7 > TP532,280960
KRAS > APC > TP53 > FBXW73,284582
KRAS > FBXW7 > APC > TP531,995840
KRAS > FBXW7 > TP53 > APC1,746360
KRAS > TP53 > APC > FBXW72,634509
KRAS > TP53 > FBXW7 > APC2,200414
TP53 > APC > FBXW7 > KRAS2,162160
TP53 > APC > KRAS > FBXW72,594592
TP53 > FBXW7 > APC > KRAS1,891890
TP53 > FBXW7 > KRAS > APC1,600830
TP53 > KRAS > APC > FBXW72,195424
TP53 > KRAS > FBXW7 > APC1,920996

Map Genotypes to Fitness (cont.)

In [7]:
#DAG
plot(cc_order)
Error in `*tmp*`[[i]]: subíndice fuera de  los límites
Traceback:

1. plot(cc_order)
2. plot.fitnessEffects(cc_order)
3. plot(g1, edgeAttrs = list(arrowsize = a1, lty = s1, lwd = lwd, 
 .     color = c1), nodeAttrs = nAttrs)
4. plot(g1, edgeAttrs = list(arrowsize = a1, lty = s1, lwd = lwd, 
 .     color = c1), nodeAttrs = nAttrs)
5. .local(x, y, ...)
6. agopen(x, name = name, layout = TRUE, layoutType = y, attrs = attrs, 
 .     nodeAttrs = nodeAttrs, edgeAttrs = edgeAttrs, subGList = subGList, 
 .     recipEdges = recipEdges)
In [8]:
# Fitness landscape
plotFitnessLandscape(cc_order_geno)
Error in to_Fitness_Matrix(x, max_num_genotypes = max_num_genotypes): We cannot deal with order effects yet.
Traceback:

1. plotFitnessLandscape(cc_order_geno)
2. to_Fitness_Matrix(x, max_num_genotypes = max_num_genotypes)
3. stop("We cannot deal with order effects yet.")

Pathway Linear Progression Model

  • Generative probabilistic model

  • The order in which mutations arise are better described at the pathway level instead of a gene level

  • Model mapped the progression model from colorectal cancer data

Model

In [12]:
## Define poset restrictions, mapping of genes to modules, and driver genes
(CRC_W <- allFitnessEffects(data.frame(parent = c("Root", "A", "B", "C"),
                                      child = c("A", "B", "C", "D"),
                                      s = c(0.6, 0.4, 0.1, 0.05),
                                      sh = -0.5,
                                      typeDep = "MN"), 
                           geneToModule = c("Root" = "Root",
                                            "A" = "APC, EPHA3, TCF7L2",
                                            "B" = "EVC2, PIK3CA, TP53", 
                                            "C" = "KRAS",
                                            "D" = "FBXW7"),
                           drvNames = c("APC", "EPHA3", "TCF7L2", "EVC2", "PIK3CA", 
                                        "TP53", "KRAS", "FBXW7")))
$long.rt
$long.rt$A
$long.rt$A$child
[1] "A"

$long.rt$A$s
[1] 0.6

$long.rt$A$sh
[1] -0.5

$long.rt$A$typeDep
        MN 
"monotone" 

$long.rt$A$parents
[1] "Root"

$long.rt$A$childNumID
A 
1 

$long.rt$A$parentsNumID
Root 
   0 


$long.rt$B
$long.rt$B$child
[1] "B"

$long.rt$B$s
[1] 0.4

$long.rt$B$sh
[1] -0.5

$long.rt$B$typeDep
        MN 
"monotone" 

$long.rt$B$parents
[1] "A"

$long.rt$B$childNumID
B 
2 

$long.rt$B$parentsNumID
A 
1 


$long.rt$C
$long.rt$C$child
[1] "C"

$long.rt$C$s
[1] 0.1

$long.rt$C$sh
[1] -0.5

$long.rt$C$typeDep
        MN 
"monotone" 

$long.rt$C$parents
[1] "B"

$long.rt$C$childNumID
C 
3 

$long.rt$C$parentsNumID
B 
2 


$long.rt$D
$long.rt$D$child
[1] "D"

$long.rt$D$s
[1] 0.05

$long.rt$D$sh
[1] -0.5

$long.rt$D$typeDep
        MN 
"monotone" 

$long.rt$D$parents
[1] "C"

$long.rt$D$childNumID
D 
4 

$long.rt$D$parentsNumID
C 
3 



$long.epistasis
list()

$long.orderEffects
list()

$long.geneNoInt
data frame with 0 columns and 0 rows

$geneModule
    Gene Module GeneNumID ModuleNumID
1   Root   Root         0           0
2    APC      A         1           1
3  EPHA3      A         2           1
4   EVC2      B         3           2
5  FBXW7      D         4           4
6   KRAS      C         5           3
7 PIK3CA      B         6           2
8 TCF7L2      A         7           1
9   TP53      B         8           2

$gMOneToOne
[1] FALSE

$geneToModule
                Root                    A                    B 
              "Root" "APC, EPHA3, TCF7L2" "EVC2, PIK3CA, TP53" 
                   C                    D 
              "KRAS"              "FBXW7" 

$graph
IGRAPH c356bc7 DN-- 5 4 -- 
+ attr: name (v/c), typeDep (e/c), color (e/c), lty (e/n), arrow.mode
| (e/n)
+ edges from c356bc7 (vertex names):
[1] Root->A A   ->B B   ->C C   ->D

$drv
[1] 1 2 3 4 5 6 7 8

$rT
  parent child    s   sh typeDep
1   Root     A 0.60 -0.5      MN
2      A     B 0.40 -0.5      MN
3      B     C 0.10 -0.5      MN
4      C     D 0.05 -0.5      MN

$epistasis
NULL

$orderEffects
NULL

$noIntGenes
NULL

$fitnessLandscape
     [,1]

$fitnessLandscape_df
data frame with 0 columns and 0 rows

$fitnessLandscape_gene_id
data frame with 0 columns and 0 rows

$fitnessLandscapeVariables
character(0)

$frequencyDependentFitness
[1] FALSE

$frequencyType
[1] "freq_dep_not_used"

attr(,"class")
[1] "fitnessEffects"

DAG

In [6]:
# DAG representation
plot(CRC_W, expandModules = TRUE, autofit = TRUE, lwdf = 2)

Map Genotypes to Fitness

In [13]:
## Map genotypes to fitness
CRC_F <- evalAllGenotypes(CRC_W, order = FALSE, addwt = TRUE)

## Plot of fitness landscape

plot(CRC_F)

Synthetic Lethality via Three-way Interaction

  • Restrictions imposed with XOR

  • Fitness values that reflect slightly deleterious effect (if two genes appear)

  • or a highly deleterious effect (if three genes appear).

In [3]:
# DAG representation
plot(CRC_W4, expandModules = TRUE, autofit = TRUE, lwdf = 2)

Map Genotypes to Fitness

In [19]:
## Plot of fitness landscape

plot(CRC_F2, use_ggrepel = TRUE)

Synthetic Viability with a Three-way Interaction

  • Highly deleterious effects if APC, TP53, or KRAS appear independently

  • Slightly deleterious effects were set if two of those genes appear in a genotype

  • Benefitial effect for the genotype of the three genes

  • Restrictions imposed with SM

In [3]:
# DAG representation
plot(CRC_W6, expandModules = TRUE, autofit = TRUE, lwdf = 2)

Map Genotypes to Fitness

In [22]:
## Plot of fitness landscape

plot(CRC_F4, use_ggrepel = TRUE)

A Probabilistic Model of Mutually Exclusive Linearly Ordered Driver Pathways (Mohaghegh Neyshabouri et al.)

This model assumes driver genes are over-represented among those mutated across a large tumor collection and, thus, they can be identified in terms of frequency. Also, those participating of the same pathway are mutated in a mutually exclusive manner because more than one mutation in a pathway does not give any selective advantage to the clone. analyze one large dataset of colorectal adenocarcinoma (COADREAD) from IntOGen-mutations database.

Model

In [23]:
## Restriction table, including DAG of restrictions specifications and associated fitness
COADREAD_rT <- data.frame(
              parent = c("Root", "A", "B", "C", "D", "E", "F"), # Parent nodes
              child = c("A", "B", "C", "D", "E", "F", "G"), # Child nodes
                             s = 0.5, 
                             sh = c(rep(-1, 4), rep(-.5, 2), -.2),
                             typeDep = "MN")


## Create fitness specifications from DAG of restrictions considering modules 
COADREAD_fitness <- allFitnessEffects(
              COADREAD_rT, 
              geneToModule = c( 
                 "Root" = "Root",
                 "A" = "APC",
                 "B" = "TP53",
                 "C" = "KRAS",
                 "D" = "PIK3CA, NRAS",
                 "E" = "FBXW7, ARID1A",
                 "F" = "ATM, SMAD2",
                 "G" = "SOX9, SMAD4")) # Modules

Map Genotypes to Fitness

In [24]:
## Evaluation of all possible genotypes fitness 
## under the previous fitness specifications
COADREAD_FL <- evalAllGenotypes(COADREAD_fitness, max = 131072)

## Fitness landscape representation
plotFitnessLandscape(COADREAD_FL)

Simplified Model

In [3]:
## Simplified DAG of restrictions representation
plot(COADREAD_fitness_5d, expandModules = TRUE, autofit = TRUE)

Map Genotypes to Fitness

In [28]:
## Fitness landscape representation
plotFitnessLandscape(COADREAD_FL_5d, use_ggrepel = TRUE)
Warning message:
“ggrepel: 21 unlabeled data points (too many overlaps). Consider increasing max.overlaps”

Frequency-dependent fitness

Clones that coexist in a tumor can influence the fitness of each other in a frequency-dependent manner.

Frequency-dependent fitness (cont.)

In [29]:
## Mapping of genotypes to frequency-dependent fitness
# Not explicitly mapped genotypes are assigned a fitness of cero
COADREAD_gen_freqdep <- data.frame(
                Genotype = c("WT", "APC","APC, TP53",
                            "APC, TP53, KRAS",
                            "APC, TP53, KRAS, PIK3CA",
                            "APC, TP53, KRAS, NRAS",
                            "APC, TP53, KRAS, PIK3CA, NRAS"),
                            Fitness = c("1", "1.5",
                                        "2.25", "3.375", "5.0625", "5.0625",
              "5.0625 - ((f_APC_TP53_KRAS_PIK3CA + f_APC_TP53_KRAS_NRAS))/2"),
                            stringsAsFactors = FALSE)

## Fitness specifications
COADREAD_fitness_freqdep <- allFitnessEffects(genotFitness = COADREAD_gen_freqdep, 
                                                   frequencyDependentFitness = TRUE,
                                                   frequencyType = "rel")

Map Genotypes to Fitness

Map Genotypes to Fitness (cont.)

In [31]:
## Fitness landscape representation
plotFitnessLandscape(COADREAD_FL_freqdep)

Conclusions

  • We have mapped various cancer progression models (mutual exclusivity assumptions) to different evolutionary models and analyzed the resulting fitness landscapes

  • All this evidence supports the idea that cancer progression is better explained through an evolutionary perspective

  • Current cancer progression models cannot capture these phenomena and, thus, conclusions derived from them should be questioned

  • OncoSimulR is a very complete and convenient clone-based genetic simulator to evaluate cancer progression under evolutionary models, but is limited to:

    • Small sets of genes for fitness landscapes interpretation.
    • No fitness landscape for studying order of effects.
    • Modules do not allow to define epistatic relationships between genes of the same module.

References

  1. Raphael BJ, Vandin F. Simultaneous Inference of Cancer Pathways and Tumor Progression from Cross-Sectional Mutation Data. Journal of Computational Biology. 2015;22(6):510–27. doi: 10.1089/cmb.2014.0161
  2. Schill R, Solbrig S, Wettig T, Spang R. Modelling cancer progression using Mutual Hazard Networks. Bioinformatics. 2020;36(1):241–9. doi: 10.1093/bioinformatics/btz513
  3. Sprouffske K, Pepper JW, Maley CC. Accurate reconstruction of the temporal order of mutations in neoplastic progression. Cancer Prevention Research. 2011;4(7):1135–44. doi: 10.1158/1940-6207.CAPR-10- 0374
  4. Pon JR, Marra MA. Driver and passenger mutations in cancer. Annual Review of Pathology: Mechanisms of Disease. 2015; doi: 10.1146/annurev-pathol-012414-040312
  5. Diaz-Uriarte R. Cancer progression models and fitness landscapes: A many-to-many relationship. Bioinformatics. 2018;34(5):836–44. doi: 10.1093/bioinformatics/btx663
  6. Tomasetti C, Vogelstein B, Parmigiani G. Half or more of the somatic mutations in cancers of self-renewing tissues originate prior to tumor initiation. Proceedings of the National Academy of Sciences of the United States of America. 2013; doi: 10.1073/pnas.1221068110
  7. Nowell PC. The clonal evolution of tumor cell populations. Science. 1976; doi: 10.1126/science.959840
  8. Lee EYHP, Muller WJ. Oncogenes and tumor suppressor genes. 2010. doi: 10.1101/cshperspect.a003236
  9. Diaz-Uriarte R. Identifying restrictions in the order of accumulation of mutations during tumor progression: Effects of passengers, evolutionary models, and sampling. BMC Bioinformatics. 2015;16(1):1–26. doi: 10.1186/s12859-015-0466-7
  10. Cristea S, Kuipers J, Beerenwinkel N. PathTiMEx: Joint Inference of Mutually Exclusive Cancer Pathways and Their Progression Dynamics. Journal of Computational Biology. 2017;24(6):603–15. doi: 10.1089/cmb.2016.0171
  11. Neyshabouri MM, Jun SH, Lagergren J. Inferring tumor progression in large datasets. PLoS Computational Biology. 2020;16(10):1–16. Available from: http://dx.doi.org/10.1371/journal.pcbi.1008183
  12. Ortmann CA, Kent DG, Nangalia J, Silber Y, Wedge DC, Grinfeld J, et al. Effect of Mutation Order on Myeloproliferative Neoplasms. New England Journal of Medicine. 2015 Feb;372(7):601–12. [accessed 20 Jan 2021] Available from: https://doi.org/10.1056/NEJMoa1412098
  13. Diaz-Uriarte R. OncoSimulR: Genetic simulation with arbitrary epistasis and mutator genes in asexual populations. Bioinformatics. 2017;33(12):1898–9. doi: 10.1093/bioinformatics/btx077
  14. Wang X, Fu AQ, McNerney ME, White KP. Widespread genetic epistasis among cancer genes. Nature Communications. 2014 Nov;5(1):4828. [accessed 20 Jan 2021] Available from: https://www.nature.com/ articles/ncomms5828
  15. Haar J van de, Canisius S, Yu MK, Voest EE, Wessels LFA, Ideker T. Identifying Epistasis in Cancer Genomes: A Delicate Affair. Cell. 2019 May;177(6):1375–83. [accessed 20 Jan 2021] Available from: http: //www.sciencedirect.com/science/article/pii/S0092867419305033
  16. Reia SM, Campos PRA. Analysis of statistical correlations between properties of adaptive walks in fitness landscapes. Royal Society Open Science. 7(1):192118. [accessed 20 Jan 2021] Available from: https: //royalsocietypublishing.org/doi/10.1098/rsos.192118
  17. Gu Y, Wang R, Han Y, Zhou W, Zhao Z, Chen T, et al. A landscape of synthetic viable interactions in cancer. Briefings in Bioinformatics. 2018 Jul;19(4):644–55. [accessed 20 Jan 2021] Available from: https://doi.org/10.1093/bib/bbw142
  18. Archetti M, Pienta KJ. Cooperation among cancer cells: Applying game theory to cancer. Nature Reviews Cancer. 2019;19(2):110–7.
  19. Gerstung M, Eriksson N, Lin J, Vogelstein B, Beerenwinkel N. The temporal order of genetic and pathway alterations in tumorigenesis. PLoS ONE. 2011;6(10). doi: 10.1371/journal.pone.0027136
  20. Ciriello G, Cerami E, Sander C, Schultz N. Mutual exclusivity analysis identifies oncogenic network modules. Genome Research. 2012; doi: 10.1101/gr.125567.111
  21. Wood LD, Parsons DW, Jones S, Lin J, Sjöblom T, Leary RJ, et al. The genomic landscapes of human breast and colorectal cancers. Science. 2007; doi: 10.1126/science.1145720
  22. Beerenwinkel N, Antal T, Dingli D, Traulsen A, Kinzler KW, Velculescu VE, et al. Genetic progression and the waiting time to cancer. PLoS Computational Biology. 2007; doi: 10.1371/journal.pcbi.0030225
  23. Yeang C-H, McCormick F, Levine A. Combinatorial patterns of somatic gene mutations in cancer. The FASEB Journal. 2008; doi: 10.1096/fj.08-108985